Media Lab Project List 9/93
Text file, 1993-12-15
RESEARCH PROJECTS IN THE MEDIA
LABORATORY

I. LEARNING & COMMON SENSE
1. Children and Machines
2. Memory-Based Representation
3. Understanding News
4. Iconic Stream-Based Video Logging
5. Storyteller Systems
6. FRAMER: Knowledge Description and Sharing
7. Graphics by Example
8. Graphics for Software Visualization
9. The Berlin Wall of Programming
10. Intelligent Technical Documentation
11. Graphical Annotation
12. Instructible Agents
13. Agent-Application Communication
14. Autonomous Agents
15. Interface Agents
16. Editors, Agents, and Butlers
17. Society of Mind
18. Animal Construction Kits
19. Structure out of Sound
20. Constructionism
21. Robot Design Competitions
22. Project Headlight
23. Learning in Multicultural Settings
24. Science and Whole Learning Teachers' Collaborative
25. Electronic Communication
26. Children as Designers
27. Games
28. Study of Mathematical Thinking
29. Thinking and Learning about Systems
30. Ubiquitous Computing for Kids
31. New Visions of Programming in Education
32. Learning in Virtual Communities

II. PERCEPTUAL COMPUTING
33. Mid-Level Vision
34. X-Y-T Image Analysis
35. Analysis of Egomotion Using Wide Angle Vision
36. Modeling and Tracking People
37. Dynamic Scene Annotation
38. Multimodal Natural Dialog
39. Advanced Interactive Mapping Displays
40. Information Appliances
41. Structure out of Sound
42. Looking at People
43. Model-Based Image Coding
44. Video Databases: Indexing by Content
45. Image Query by Texture Content
46. Nonlinear Space-Time Texture Models
47. Semantic Image Modeling
48. Computers and Telephony
49. Desktop Audio
50. Voice Interfaces to Hand-Held Computers
51. Voice Hypermedia
52. Telephone-Based Voice Services
53. Synthetic Performers
54. Synthetic Listeners
55. Synthetic Spaces
56. Cognitive Audio Processing
57. Structured Audio Transmission

III. INFORMATION & ENTERTAINMENT
58. Salient Stills
59. Color Semantics
60. Knowing the Individual
61. Interactive Computation of Holographic Images
62. Scaled-Up Holographic Video
63. Holographic Laser Printer
64. Immersive Projected-Image Holographic Displays
65. Medical Image Holography
66. Edge-Lit Holograms
67. Open Architecture Television
68. Cheops: Data-Flow Television Receiver
69. Motion Modeling for Video Coding
70. Production, Distribution, and Viewing of Structured Video Narratives
71. Multimedia Testbed
72. Computationally Expressive Tools
73. Large-Scale, High-Resolution Display Prototypes
74. Input/Output Considerations
75. Advanced Interactive Mapping Displays
76. Experiments in Elastic Media
77. Video Editing: Computational Partnerships
78. Stories with a Sense of Themselves
79. Directing Digital Video: New Tools
80. Storyteller Systems
81. Production, Distribution, and Viewing of Structured Video Narratives
82. Real-Time Modeling
83. Interface Sensors and Transducers
84. Information, Computation, and Physics
85. Incremental Coding
86. Movies via Modems
87. Objective Coding
88. Dimensionalization
89. Casual Collaboration
90. Structure out of Sound
91. Hyperinstruments
RESEARCH
The ongoing research of the Media
Laboratory extends across a wide realm
of activities, which may be clustered
into three broad areas: LEARNING &
COMMON SENSE, PERCEPTUAL COMPUTING,
and INFORMATION & ENTERTAINMENT.
I. LEARNING & COMMON SENSE
1. Children and Machines (Professor
Edith Ackermann)
Several projects involve children's
conceptions of machines. One project
focuses specifically on how young
children describe and understand the
functioning of simple machines.
Another project focuses on
descriptions of cybernetic machines
that interact with their environments.
A major interest is in how children
think about such machines, whether
they see them as "creatures" or as
"things."
2. Memory-Based Representation
(Professor Kenneth Haase)
We are developing an alternative
account of representation where the
structure of knowledge and cognition
emerges from the connection of current
descriptions to past situations and
not from some a priori framework into
which situations and experience are
translated. Artificial Intelligence
and Cognitive Science traditionally
assume that one's representation
(one's encoding of experience)
determines the structure of memory; we
are exploring models of memory where
this determination goes in both
directions. Descriptions are stored in
memory by connecting them with
descriptions already recorded and
noting the residual differences
unexplained by the connections made.
In this way, what is stored in memory
has a significant effect on how future
descriptions are encoded and stored.
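The storage scheme described above can be sketched in a few lines of Python. Everything here is illustrative - the data, the feature-set representation, and all function names are invented for this sketch, not taken from the Lab's system - but it shows the idea: a new description is linked to its closest stored neighbor, and only the residual differences are recorded.

```python
# Toy memory-based store: link each new description to its nearest
# stored neighbor and record only the unexplained residue.

def closest(memory, desc):
    """Return the stored entry sharing the most features with desc."""
    return max(memory, key=lambda m: len(m["features"] & desc), default=None)

def store(memory, desc):
    base = closest(memory, desc)
    entry = {"features": desc,
             "linked_to": None if base is None else base["features"],
             "residual": desc if base is None else desc - base["features"]}
    memory.append(entry)
    return entry

memory = []
store(memory, {"dog", "chased", "cat"})
e = store(memory, {"dog", "chased", "mailman"})
print(e["residual"])   # only the difference is new: {'mailman'}
```

Because later descriptions link to whatever is already stored, what memory contains shapes how the next description is encoded, which is the two-way determination the project explores.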
3. Understanding News
(Professor Kenneth Haase)
We are applying our memory-based
representation systems to
comprehending, filtering, and
summarizing news stories. News stories
taken from various wire services and
other sources are run through a simple
parser which annotates the text with
phrase boundaries and possible
relationships between phrases. This
annotated text is then passed to the
memory-based representation system and
"understood" by identification of and
connection with similar stories
already in memory; preferences and
queries are interpreted as partial
stories which match incoming or
recorded descriptions. Comparison of
such understood texts with texts
previously read by a user allows user-
specific summarization of new articles
based on the real differences between
articles. In addition to filtering
incoming daily news, these tools
provide an interface to large text
databases and other sorts of databases
(e.g., images and video segments)
annotated with textual descriptions.
One strategic advantage of this
approach is that in the worst case, it
does as well as keyword matching -
similar words indicate similar
articles - yet in the best case it
does as well as a human editor or
selector.
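The worst-case claim above - matching never does worse than keyword overlap - and the idea of user-specific summarization can both be sketched with hypothetical data (the word-set representation and names below are illustrative only):

```python
# Sketch: a query is a "partial story" of words; matching degrades
# gracefully to keyword overlap, and a user-specific summary keeps
# only the content not covered by articles the user already read.

def match_score(partial_story, article):
    return len(partial_story & article)

def novel_content(article, already_read):
    seen = set().union(*already_read) if already_read else set()
    return article - seen

read = [{"fire", "downtown", "warehouse"}]
new = {"fire", "downtown", "arson", "suspect"}
print(sorted(novel_content(new, read)))   # ['arson', 'suspect']
```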
4. Iconic Stream-Based Video Logging
(Professor Kenneth Haase)
Media Streams is an iconic logging
system for video content which
provides the descriptions used by
storyteller systems, archival
retrieval programs, content-based
editors, and other systems which can
take advantage of knowing the content
of recorded video. The logger treats
video as a stream with temporally
bounded events rather than as a set of
clips with attached keywords; this
allows the system to automatically
"cut" the video to its own purposes.
Video annotations are represented
graphically to enhance data
visualization and to enable logs to be
shared among human and machine users;
in addition, palettes of commonly used
sets of iconic annotations streamline
the logging of segments similar to
segments seen before. The indexing of
both the video itself (whose images
are stored digitally) and of the icon
palettes connects to the facilities of
a memory-based representation in the
background.
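The stream-versus-clips distinction can be made concrete with a small sketch (the class, labels, and times below are invented for illustration): because each annotation covers a time interval rather than a fixed clip, the system can "cut" the stream at any point and still know what the resulting segment contains.

```python
# Stream-based annotation sketch: annotations are temporally bounded
# events, so any cut [t0, t1] can be described by interval overlap.

from dataclasses import dataclass

@dataclass
class Annotation:
    label: str
    start: float   # seconds
    end: float

log = [Annotation("woman walking", 0.0, 12.5),
       Annotation("city street", 0.0, 30.0),
       Annotation("bus passes", 8.0, 11.0)]

def content_at(log, t0, t1):
    """Labels of annotations whose interval overlaps the cut [t0, t1]."""
    return [a.label for a in log if a.start < t1 and a.end > t0]

print(content_at(log, 9.0, 10.0))   # all three events are on screen
print(content_at(log, 20.0, 25.0))  # only the street remains
```

A keyword-on-clip scheme cannot answer the second query without re-logging, which is why the stream representation lets the system repurpose footage automatically.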
5. Storyteller Systems
(Professor Kenneth Haase and Professor
Glorianna Davenport)
Storyteller systems are sophisticated
programs with deep and detailed
knowledge of some particular domain or
domains and access to "media
resources" - recorded video, sound,
and text - regarding the domain. By
combining these resources with
synthesized graphical and textual
representations, a storyteller system
produces a story customized to what it
knows - and what it learns - of a
listener's background, preferences,
and interests. These stories emerge
dynamically as the system interacts
with the user; questions and
criticisms yield wholly new sequences
of video, sound, and explanation in
reply. Such systems transform the
character of publication: rather than
producing epistles, one produces
emissaries.
6. FRAMER: Knowledge Description and
Sharing
(Professor Kenneth Haase)
FRAMER is a portable library for
knowledge representation and inference
being used in a variety of projects
around the Lab. FRAMER provides a
persistent object-oriented database
with a simple inheritance mechanism
and an embedded extension language
(FRAXL) based on SCHEME. FRAMER data
structures are easily shared between
different hardware platforms
(workstations, Macintoshes, PCs) and
software platforms (C and LISP).
Current work on FRAMER includes the
development of a portable user
interface API for FRAXL, a networked
implementation supporting the
distribution of programs and data, and
the integration of ongoing analogical
representation work with FRAMER.
7. Graphics by Example
(Henry Lieberman)
Experts in visual domains such as
graphic design are fluent in the
generation and critique of visual
examples. We are combining
representation and learning techniques
from artificial intelligence with
interactive graphical editors to
create a "programming by example"
system to assist designers in
automating graphical procedures.
8. Graphics for Software Visualization
(Henry Lieberman)
This project explores how modern
computer graphic imagery can be used
as a tool to help programmers
visualize software. We are
implementing a range of experimental
debugging systems that use color,
animated typography, and three-
dimensional visual representation of
programs.
9. The Berlin Wall of Programming
(Henry Lieberman)
The increasing demand for graphical
workstations creates a schism between
fast languages, such as C, and
prototyping languages, such as LISP,
in the UNIX environment. We are
researching methods of overcoming this
split in order to integrate AI with
graphics in real time.
10. Intelligent Technical
Documentation
(Henry Lieberman)
Technical documentation for hardware
and software is expensive to produce,
often inaccurate and inadequate. We
are exploring a new approach to
producing technical documentation in
which an expert interacts with a
simulation of a device, and the system
automatically produces both English
descriptions and visual illustrations.
11. Graphical Annotation
(Henry Lieberman)
People often communicate important
knowledge by drawing and labeling
diagrams. Why can't we communicate
knowledge to a machine by using
graphical indications of parts and
structure rather than by textual
databases or programming languages? We
are using computer-readable graphical
annotation of images in a direct-
manipulation editor to communicate
relations that tell the system how to
interpret and generalize user actions.
We are also exploring voice input so
that the user can explain actions to
the machine as they are being
performed.
12. Instructible Agents
(Henry Lieberman)
Agent software can perform tasks
automatically on behalf of a user, but
how does the agent come to learn what
the user wants? Sometimes the agent
can learn just by observing user
behavior, but there may also need to
be interaction where the user
instructs the agent more explicitly.
The instructibility aspect is the
focus of this project. The user may
present examples of behavior that the
agent should follow and give advice to
the agent as to how the examples
should be interpreted. The agent must
give feedback to the user so that the
user understands what the agent knows
and is capable of doing. Multimodal
interaction is important in both the
instruction and feedback.
13. Agent-Application Communication
(Henry Lieberman)
Current experiments in agent software
rely mostly on domain-specific
applications that have been programmed
from scratch or explicitly modified
with agents in mind. Is it possible to
make a toolkit
or protocol that would allow an agent
to communicate and control
applications that have been
constructed more conventionally? Can
the agent "take the place" of the user
in the interface? Can the agent have
access to the application's data and
behavior? Will commercial "inter-
application communication" mechanisms
suffice? What is the division of labor
between the agent and the application?
14. Autonomous Agents
(Professor Pattie Maes)
This project applies artificial
intelligence techniques to the field
of human-computer interaction. In
particular, techniques and systems
developed in the area of autonomous
agents and the area of commonsense
representation are combined to
implement "interface agents":
interfaces that provide expert
assistance to a person engaged in the
use of a particular computer
application. Interface agents differ
from current day interfaces in that
they are more autonomous (performing
many of the time-consuming, more
mundane tasks the user normally would
have to perform), more intelligent
(learning from the user by observation
and querying), and more personalized
(customizing according to the user's
goals, needs, preferences, habits, and
history of interaction with the
system). The project focuses on how
interface agents can acquire their
competence using machine-learning
techniques.
15. Interface Agents
(Professor Pattie Maes)
This project applies artificial
intelligence techniques to the field
of human-computer interaction. In
particular, techniques and systems
developed in the area of autonomous
agents and the area of commonsense
representation are combined to
implement "interface agents":
interfaces that provide expert
assistance to a person engaged in the
use of a particular computer
application. Interface agents differ
from current day interfaces in that
they are more autonomous (performing
many of the time-consuming, more
mundane tasks the user normally would
have to perform), more intelligent
(learning from the user by observation
and querying) and more personalized
(customizing according to the user's
goals, needs, preferences, habits, and
history of interaction with the
system).
16. Editors, Agents, and Butlers
(Professor Pattie Maes)
This project attempts to deal with the
problem of news information overload.
We are building "interface agents" for
news filtering. These are semi-
intelligent computer systems that make
personalized suggestions to a user for
news items (text, video, audio). The
user is able to browse through the
news available (as is the case with
current interfaces), but some of the
news items will have been
"highlighted" while other items might
have been left out by the agents.
These agents learn which news items
the user might be interested in, in three
different ways. First, the user is
able at all times to instruct an agent
about which news items the user wants
to receive or not receive. Second, the
user is given the option of providing
feedback to the agent about how much
certain news items are liked or
disliked. These feedback data are used
by the agent to discover regularities
in the user's news interests in terms
of the content of the article, as well
as other features such as the author,
urgency, and news source. Third, these
feedback data are used to detect
similarities between different users
and to discover "clusters" of users
with similar news interests (on a
given news topic). Once such clusters
have been detected, news items that
one or more users liked are suggested
by the agent to a user with similar
interests.
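The second and third learning modes above - feedback-driven feature scoring and clustering of like-minded users - can be sketched as follows. All names, features, and data are hypothetical; the actual agents' learning methods are not specified in this description.

```python
# Sketch: per-user feature scores learned from like/dislike feedback,
# plus a crude agreement measure for finding similar users.

def update(profile, features, liked):
    """Nudge each feature's score up on a like, down on a dislike."""
    for f in features:
        profile[f] = profile.get(f, 0) + (1 if liked else -1)

def score(profile, features):
    """Predicted interest in an item with these features."""
    return sum(profile.get(f, 0) for f in features)

def similarity(p1, p2):
    """Count shared features on which the two users' feedback agrees."""
    shared = set(p1) & set(p2)
    return sum(1 for f in shared if (p1[f] > 0) == (p2[f] > 0))

alice, bob = {}, {}
update(alice, {"sports", "urgent"}, liked=True)
update(bob, {"sports"}, liked=True)
update(bob, {"politics"}, liked=False)
print(score(alice, {"sports", "weather"}))   # 1
print(similarity(alice, bob))                # 1
```

Once users cluster by agreement, an item one cluster member liked can be suggested to the others, which is the third mode described above.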
17. Society of Mind
(Professor Marvin Minsky)
Professor Minsky continues to develop
the theory of human thinking and
learning called the "Society of Mind."
This theory explores how phenomena of
mind emerge from the interaction of
many disparate agencies, each mindless
by itself. For example, one aspect of
the theory explains reasoning by
analogy on the basis of transforming
between different kinds of knowledge
representations. Another aspect is a
"re-duplication" account of natural
language, in which grammatical forms
are seen as emerging directly from
expressive requirements of
communication between different
mechanisms inside the brain, rather
than from conventions that
communications between people are
forced to fit. Professor Minsky has a
continuing interest in the limits and
potentials of "connectionist learning
systems" and their role in distributed
cognitive accounts like the Society of
Mind. He is actively considering how
such systems may be combined and
interconnected in a way that avoids
the serious scaling problems of
unstructured connectionist systems.
18. Animal Construction Kits
(Professor Marvin Minsky)
This project simulates animal
behavior, with the goals of developing
computational models for ethology and
investigating situated-action
approaches to artificial intelligence.
A related
goal is the development of
environments for facilitating such
projects.
19. Structure out of Sound
(Professor Marvin Minsky, Andrew
Lippman, and Michael Hawley)
In an information-rich environment
where data, images, and sound are
readily accessible and digitally
communicated, the issue of content-
based search becomes a necessity.
Structure out of Sound is the first
attempt at a unified analysis tool for
speech, music, and sound effects.
Movies are analyzed into sonic
primitives that allow one to divide a
movie into dialogue and action or to
identify the presence of a single
actor. The initial work, a doctoral
thesis, lays out the groundwork for
later addition of visual browsing and
correlating elements.
20. Constructionism
(Professor Seymour Papert, Professor
Edith Ackermann, and Professor Mitchel
Resnick)
We are developing "constructionism" as
a theory of learning and education.
Constructionism is based on two
different senses of "construction." It
is grounded in the idea that people
learn by actively constructing new
knowledge, not by having information
"poured" into their heads. Moreover,
constructionism asserts that people
learn with particular effectiveness
when they are engaged in
"constructing" personally meaningful
things (such as stories, animations,
or robots).
21. Robot Design Competitions
(Professor Seymour Papert and
Professor Mitchel Resnick)
We have helped develop an intensive,
one-month robot design course for MIT
undergraduates. In the course,
students design and build robots made
from electronic and LEGO parts, then
pit the robots against one another in
elimination-style competition. The
Robot Design Competition is a living
laboratory for the constructionist
theory of learning, and a vehicle for
exploring the role of design
activities in education. In the
future, we plan to organize similar
activities for precollege students,
using our new "Programmable Brick"
technology.
22. Project Headlight
(Professor Seymour Papert)
Eight years ago, we began a
partnership with the Hennigan School,
a multicultural public elementary
school in Boston. At the school, we
have helped develop a technology-rich
environment, with more than 100
personal computers for 200 students.
We have worked with teachers and
students to explore new approaches to
education and new uses of technology
in education.
23. Learning in Multicultural Settings
(Professor Seymour Papert and
Professor Edith Ackermann)
For several years, we have focused on
issues related to gender, race,
culture, and cognitive styles. One
setting for this research is Paige
Academy, a small, independent
Afrocentric school in the Roxbury
section of Boston. This setting
provides an organizationally and
culturally different context for the
development of new ideas about
learning.
24. Science and Whole Learning
Teachers' Collaborative
(Professor Seymour Papert)
We maintain a working relationship
with a network of teachers from
different schools (mostly in the
Boston area, but also some in other
parts of the country). Through this
network, we have collaborated with
teachers in developing concepts for
workshops, seminars, and other
activities to foster their
professional development.
25. Electronic Communication
(Professor Seymour Papert and
Professor Mitchel Resnick)
We maintain a telecommunications
network through which collaborating
teachers and schools can maintain
contact with the group and with one
another. Elementary-school students
also use the network. In one project,
bilingual students in Boston are
communicating with students in Costa
Rica.
26. Children as Designers
(Professor Seymour Papert and
Professor Edith Ackermann)
We are studying how children can
change from "consumers" into
"designers" of computer-based
multimedia productions. In one
project, elementary-school students
are designing their own computer games
- and, in the process, learning about
programming, mathematics,
collaboration, and design. The project
is an extension of earlier research in
which children designed instructional
software to help other students learn
about fractions.
27. Games
(Professor Seymour Papert and
Professor Mitchel Resnick)
The idea of playful learning is
pervasive in all of our activities.
Specific game-oriented research
includes studying children's attachment
to video games, studying the informal
learning process through which
children master new games, and
studying children as designers and
implementers of their own games.
28. Study of Mathematical Thinking
(Professor Seymour Papert)
The theme of studying mathematical
thinking pervades many projects. A
specific project in this category is a
study of probabilistic thinking in
children and adults.
29. Thinking and Learning about
Systems
(Professor Mitchel Resnick)
We are studying how students think
about "systems concepts" (such as
feedback, self-organization, and
evolution), and how to make these
ideas more accessible to young
children. As part of this effort, we
have developed an extended version of
Logo with thousands of interacting
graphic turtles, which students can
use to explore ideas about self-
organizing and decentralized systems
(such as ant colonies and traffic
jams).
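The flavor of a decentralized model like the traffic-jam example can be shown in plain Python standing in for the extended Logo (the road length, car count, and rule are invented for this sketch): each car follows one local rule - move ahead only if the next cell is empty - and jams emerge with no central cause.

```python
# Decentralized traffic sketch: cars on a one-lane ring road, each
# obeying a single local rule, with no global controller.

import random

random.seed(1)
ROAD = 30
cars = set(random.sample(range(ROAD), 12))   # 12 cars at random cells

def step(cars):
    moved = set()
    for c in sorted(cars):
        ahead = (c + 1) % ROAD
        # local rule: advance only if the cell ahead was empty
        moved.add(ahead if ahead not in cars else c)
    return moved

for _ in range(5):
    cars = step(cars)
print("".join("#" if i in cars else "." for i in range(ROAD)))
```

Runs of `#` in the printout are jams: no car intends to stop, yet clusters of stopped cars form wherever cars happen to bunch up, which is the kind of emergent, self-organizing behavior students explore with the turtle version.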
30. Ubiquitous Computing for Kids
(Professor Mitchel Resnick)
We are extending the notion of the
child's construction kit, adding
computational elements to the bin of
building parts, so that children can
embed computational power in the
machines they build, and spread
computation throughout their world.
This idea is part of a more general
movement toward "ubiquitous computing"
- the incorporation of computational
elements into everyday objects. As
part of this effort, we are developing
a "Programmable Brick" - a LEGO brick
(the size of a deck of cards) with a
computer inside.
31. New Visions of Programming in
Education
(Professor Mitchel Resnick)
We are introducing new "programming
paradigms" into educational computing
- for example, adding multiprocessing
capabilities to the Logo programming
language. These new paradigms not only
extend the types of projects that
children can work on (for example,
making it much easier for children to
create their own video games), they
also help children develop new ways of
thinking about certain mathematical
and scientific concepts.
32. Learning in Virtual Communities
(Professor Mitchel Resnick)
Imagine students from many different
schools, each connected (via the
Internet) to the same "virtual world."
Students can "walk" around the world,
and meet and talk with other students.
Perhaps one "room" in the world is
dedicated to discussions about
environmental issues. The world is
also extensible: students can create
and program new "objects" and new
"rooms." We are creating such on-line
worlds (known generically as "MUDs")
as a context for students to become
meaningfully engaged in reading,
writing, and programming.
II. PERCEPTUAL COMPUTING
33. Mid-Level Vision
(Professor Edward Adelson)
We are developing early and mid-level
vision mechanisms that emulate the
processing that occurs in primate
visual cortex and are designing
algorithms that apply them with high
computational efficiency. The
mechanisms are useful for edge
detection, texture analysis, motion
analysis, and image enhancement.
34. X-Y-T Image Analysis
(Professor Edward Adelson and
Professor Aaron Bobick)
We treat a sequence of images as a
three-dimensional volume, with the
dimensions of x, y, and t (time).
Motion analysis involves orientation-
selective filtering within this
volume. We are developing techniques
for dealing with difficult situations
such as motion occlusion and motion
transparency.
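The reason motion analysis reduces to orientation-selective filtering in the x-y-t volume can be seen in one dimension: a pattern translating at velocity v satisfies the constraint Ix*v + It = 0, so its space-time trace is a tilted structure whose slope is the velocity. The sketch below uses a synthetic signal, not the Lab's filters:

```python
# Why motion is orientation in x-t: recover a known velocity from
# space-time gradients of a translating 1-D pattern.

import math

def intensity(x, t, v=2.0):
    return math.sin(0.5 * (x - v * t))   # pattern moving right at v

# central differences approximate the partial derivatives Ix and It
dx = dt = 1e-4
x0, t0 = 1.0, 0.0
Ix = (intensity(x0 + dx, t0) - intensity(x0 - dx, t0)) / (2 * dx)
It = (intensity(x0, t0 + dt) - intensity(x0, t0 - dt)) / (2 * dt)

v_est = -It / Ix          # from the brightness constancy constraint
print(round(v_est, 3))    # recovers v = 2.0
```

Occlusion and transparency are hard precisely because they break this single-orientation picture: two motions superimpose two orientations in the same region of the volume.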
35. Analysis of Egomotion Using Wide
Angle Vision
(Professor Aaron Bobick)
A critical problem in computer vision
is determining the motion of the
camera through a scene (egomotion). We
are developing techniques for using
stereo, wide-angle imagery data to
give a better egomotion estimate than
monocular sequences of images, and in
a way that is much simpler than
previous approaches.
36. Modeling and Tracking People
(Professor Aaron Bobick)
The ability to track people in
imagery, and determine their positions
and pose, is critical for many machine
interface and telecommunications
technologies. The goal of this
research is to use generic models of
people along with known information
about the environment to maintain an
accurate geometric model of the
people. Doing this requires
intelligent reasoning about multiple
views and occlusion.
37. Dynamic Scene Annotation
(Professor Aaron Bobick)
In a dynamic scene, what is in the
image is less important than what is
happening in the scene. We are
developing dynamic description
mechanisms capable of extracting the
important aspects of the behavior or
motion present in a scene. Two domains
we are exploring are charting football
plays and extracting choreography from
a ballet sequence.
38. Multimodal Natural Dialog
(Dr. Richard A. Bolt)
People in each other's presence
communicate via speech, gesture, and
gaze. The aim of this research is to
make it possible for people to
communicate with computers in
essentially the same way. This
research explores combined speech,
free-hand manual gesture, and gaze as
input modes to the computer. One side
of this effort is adapting
technologies to capture inputs from
the user: a speech recognizer,
gesture-sensing gloves, and a head-
mounted eye-tracking system. These
technologies are off-the-shelf, and as
more efficient, less obtrusive
technologies emerge they will be
assimilated into the work. The other
side of the effort involves the
creation and elaboration of the
software intelligence to interpret
input from speech, hands, and eyes,
and to map to an appropriate response
in graphics and speech or nonspeech
sound.
The main expected outcome from this
research is that computer-naive people
(read: most of the world) will be
able to use everyday social and
linguistic skills to access computers
and computer-based media.
39. Advanced Interactive Mapping
Displays
(Dr. Richard A. Bolt, Professor Muriel
R. Cooper, and Ronald MacNeil)
This topic represents a three-year
project involving:
*Development of graphically intelligent
tools and principles to support the
interactive creation of symbolic
information landscapes.
*Integration of such landscapes with
pictorially convincing virtual
environments.
*Enabling of multimodal natural
language communication with the
virtual environment display and its
contents via combinations of speech,
manual gesture, and gaze.
These three streams of investigation
are to converge in year three of the
overall project in the context of an
ultra-high-definition, seamlessly
tiled wall-sized display (DataWall).
40. Information Appliances
(Michael Hawley)
Tools and appliances of all sorts,
from wristwatches and notebooks to
concert grand pianos and home
entertainment systems, are sprouting
digital components. To interoperate
harmoniously, and to ease the personal
interface to a global information
system, appliances need to communicate
with each other. This project studies
the languages and systems required for
an open and scalable architecture.
41. Structure out of Sound
(Michael Hawley, Professor Marvin
Minsky, and Andrew Lippman)
In an information-rich environment
where data, images, and sound are
readily accessible and digitally
communicated, the issue of content-
based search becomes a necessity.
Structure out of Sound is the first
attempt at a unified analysis tool for
speech, music, and sound effects.
Movies are analyzed into sonic
primitives that allow one to divide a
movie into dialogue and action or to
identify the presence of a single
actor. The initial work, a doctoral
thesis, lays out the groundwork for
later addition of visual browsing and
correlating elements.
42. Looking at People
(Professor Alex Pentland)
This large, multiyear research project
called "Looking at People" is composed
of several different subprojects,
including real-time tracking of people's
body positions as they point and move
about in the work environment, gesture
and expression recognition, and
continued development of our real-time
face recognition system. Currently
there are two "test bed" applications
of this technology: a real-time
virtual reality system called ALIVE
(with Professor Pattie Maes) and a
"smart" teleconferencing system.
43. Model-Based Image Coding
(Professor Alex Pentland)
This research project is developing
generic, physically based models that
allow ultra-low bandwidth image
compression. Using such models we can
concisely describe an object's
appearance, and predict how its
appearance will change as the object
and camera move. Using these
techniques we have been able to
achieve high-quality still-image
compression with 50:1 to 100:1
compression ratios, and high-quality
video compression at only 8
kilobits/second.
'44. Video Databases: Indexing by
Content'
(Professor Alex Pentland)
One of the most significant problems
with multimedia technology is that you
can't find what you want. This is
because, unlike text-only systems, you
can't ask a computer about the
contents of images or video. For
instance, you can't ask the computer
to "find another video clip like this
one, but shot from another angle," or
"find a video clip of me on the
beach." We are working to solve these
problems by making computers able to
"see" the contents of images and
video.
45. Image Query by Texture Content
(Professor Rosalind W. Picard)
People can quickly scan a lot of
pictures and identify a particular
pattern in a still image or video
sequence. Machines currently cannot.
We are studying how humans recognize
visual patterns, and we are building
computer models to mimic this
behavior. Particular attention is
given to how humans classify patterns
and interpret directionality,
contrast, periodicity, randomness,
translation, rotation, perspective,
and scale.
46. Nonlinear Space-Time Texture
Models
(Professor Rosalind W. Picard)
A bicyclist's pedaling may be
identified as a periodic texture in
time. Ravaging flames or turbulent
water can each be thought of as a
stochastic texture in space and time.
We are developing nonlinear models for
spatio-temporal patterns that don't
adhere to the "rigid body, affine
motion" assumption. Models currently
under exploration include physical
models of turbulence and biologically
motivated reaction-diffusion systems.
We have also been developing general
methods for nonlinear optimization;
these have many applications such as
recognition of nonlinear patterns.
47. Semantic Image Modeling
(Professor Rosalind W. Picard)
If I state "Atlanta is in Cincinnati"
today, it is unlikely you will think I
am coherent. If, however, we are
talking baseball, then the sentence is
very clear. The context makes the
interpretation not only easier, but
possible. Similarly, with pictures, if
you see blue at the top then it's
probably sky. The goal of this work is
to begin setting up two-way
interaction between available
contextual information and the models
used to represent visual information.
The ultimate goal is the one Shannon
missed - putting semantic meaning into
"information" theory.
48. Computers and Telephony
(Christopher M. Schmandt)
Computer workstations can provide a
much needed user interface to advanced
telephony functions, provided a path
exists between the workstation and
switch. Controlling call set-up from a
user's workstation allows a greater
degree of personalization and dynamic
call handling, both for outgoing and
incoming calls. This project is being
implemented in the ISDN environment of
MIT's campus telephone network, using
Phoneserver, a computer network
interface to Basic Rate ISDN
switching.
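The personalization this workstation-to-switch path enables can be sketched as a small rule engine deciding what to do with an incoming call before it rings. The rule format and actions below are invented for illustration and are not Phoneserver's actual interface:

```python
# Hypothetical per-user call-handling rules: each rule may constrain
# the caller and the hour of day, and the first match wins.

def route_call(caller, hour, rules, default="ring"):
    """Return the action for an incoming call (first matching rule)."""
    for rule in rules:
        if "caller" in rule and rule["caller"] != caller:
            continue
        if "hours" in rule and hour not in rule["hours"]:
            continue
        return rule["action"]
    return default

rules = [
    {"caller": "boss", "action": "ring"},                # always ring
    {"hours": range(0, 8), "action": "voicemail"},       # overnight
    {"hours": range(12, 13), "action": "forward:lab"},   # lunch hour
]
```

With rules like these, `route_call("boss", 3, rules)` rings through while an unknown caller at 3 a.m. goes to voicemail, the kind of dynamic handling the project describes.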
49. Desktop Audio
(Christopher M. Schmandt)
This project explores software
architectures and user interfaces to
voice as a computer data type as well
as a command channel. Its goal is to
make speech ubiquitous across a range
of applications, for instance, editing
a telephone message to include it as
annotation in a text document. Related
issues include object-oriented
manipulation of multiple media
"selection" (or "clipboard") data
between processes.
50. Voice Interfaces to Hand-Held
Computers
(Christopher M. Schmandt)
This project is using a mock-up to
explore user interfaces and
applications of voice in a hand-held
computer. The target is a machine, the
size of a microcassette recorder,
which is simply a mobile extension of
a more powerful desktop computer.
Applications include note-taking,
outlining, and a memory assistant.
51. Voice Hypermedia
(Christopher M. Schmandt)
The project takes the traditional
"hypertext" approach to a voice-only
environment. Text is replaced by
recorded voice segments, and the user
interface consists of a speech
recognizer and speech synthesizer. A
related issue is automatic
segmentation of recorded speech
segments into semantically meaningful
chunks.
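One simple approach to the automatic segmentation mentioned above is to split a recording at sustained pauses. The sketch below operates on short-time energy values; the threshold and minimum pause length are illustrative assumptions, not the project's parameters:

```python
# Pause-based segmentation sketch: `energies` is a list of short-time
# energy values; runs of at least `min_pause` low-energy frames split
# the recording. Returns (start, end) index pairs, end exclusive.

def segment_by_pauses(energies, threshold=0.1, min_pause=3):
    segments, start, silence = [], None, 0
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i        # speech begins
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause:
                # close the segment at the frame before the pause began
                segments.append((start, i - silence + 1))
                start, silence = None, 0
    if start is not None:
        segments.append((start, len(energies) - silence))
    return segments
```

Real semantically meaningful chunking needs more than energy (pauses do not always align with phrase boundaries), which is precisely why it is posed here as a research issue.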
52. Telephone-Based Voice Services
(Christopher M. Schmandt)
This project explores the utility of
voice in a range of applications
offering services to users of the
telephone network. Topics being
examined include voice mail, speech
synthesis of electronic mail, access
to calendars and rolodexes, and speech-
based user interface to call
processing features such as variable
call forwarding. Visual (on the
workstation) and speech (over the
telephone) based applications offer
differing views of the same underlying
databases in an office environment.
53. Synthetic Performers
(Professor Barry Vercoe)
We have shown that computers can
exhibit real-time musical behavior
similar to that of skilled human
performers. Our live violinist
accompanied by a computer-driven piano
has been widely viewed on public TV.
This research continues to explore the
music-cognitive issues that arise when
a computer is put in the position of
real-time, highly sensitive human
interaction.
54. Synthetic Listeners
(Professor Barry Vercoe)
This project is researching audio
signal separation, with a focus on
polyphonic pitch detection. We want to
understand how humans do multisource
audio separation with ease (the
"cocktail party conversation" trick),
and why machines cannot. We are
developing a representation of sound
using recent concepts of human
auditory encoding, so that machines
might perceive complex audio signals
the way humans do.
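As a point of reference for the polyphonic problem, the single-source case can be handled by autocorrelation. The sketch below is that far simpler baseline, not the project's auditory-encoding representation:

```python
import math

# Single-source pitch detection by autocorrelation: the lag at which
# a waveform best correlates with itself estimates its period.

def detect_period(signal, min_lag=2):
    n = len(signal)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, n // 2):
        score = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# A 100 Hz sine sampled at 1000 Hz has a period of 10 samples.
sig = [math.sin(2 * math.pi * 100 * t / 1000) for t in range(200)]
```

With two or more simultaneous sources the autocorrelation peaks interleave and this method breaks down, which is the gap the project's perceptual representation is meant to close.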
55. Synthetic Spaces
(Professor Barry Vercoe)
Research is being conducted on
electronic enhancement of a room's
natural ambience via an active
boundary system of microphones and
speakers. The technique utilizes a new
class of flat reverberators running on
a high-speed digital audio processor.
Our goal is to separate acoustics from
architecture within rooms and public
spaces.
56. Cognitive Audio Processing
(Professor Barry Vercoe)
This project is investigating how
humans perceive and quantify music and
audio information in cultural
contexts. This involves computer-
assisted understanding of source
identification, voice intonation,
rhythmic and tonal structure, and
emotional content, within both Western
and non-Western traditions.
57. Structured Audio Transmission
(Professor Barry Vercoe)
We are researching the flexible
encoding of speech, music, and
ambience (partially rendered),
suitable for rate-varying packet
transmission over a multiplexed
audio/video channel. We are also
studying receiver decoding, channel
assignment, and rendering according to
the level of local resources, with
receivers that are self-calibrating
and adaptive.
III. INFORMATION & ENTERTAINMENT
58. Salient Stills
(Walter Bender)
A Salient Still is a 1500-line, print-
quality photograph created from a
video sequence. It can carry both the
context and the detailed content of
the sequence. The data representation
consists of video pans, tilts, and
zooms warped into a continuous
space/time volume. A high-resolution,
panoramic still image is extracted
from this representation. This still
image has both the wide field of view
captured by the short focal-length
frames and the detail captured by the
long focal-length frames.
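The compositing step can be sketched in one dimension. The example below assumes the frame-to-frame offsets are already known (in the real system they are recovered from the pans, tilts, and zooms); each panorama position takes the median of every sample that lands on it, which suppresses transient foreground content:

```python
import statistics

# Toy 1-D panorama compositing: `frames` are lists of pixel values,
# `offsets` give each frame's known position in the panorama.

def composite(frames, offsets, width):
    columns = [[] for _ in range(width)]
    for frame, off in zip(frames, offsets):
        for x, v in enumerate(frame):
            if 0 <= off + x < width:
                columns[off + x].append(v)
    # Median over time at each position rejects transient outliers.
    return [statistics.median(c) if c else 0 for c in columns]
```

A transient value seen in only one frame is voted out by the median, while the panorama covers a wider field than any single frame, mirroring the still's combination of context and detail.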
59. Color Semantics
(Walter Bender)
We are exploring the role of color
alignment in the preservation of the
experience of color. Central to this
investigation is the formulation of
color alignment and its measurement.
Objective quantification of color
relatedness is desirable, since it
allows precise specification of color
in relation to its surrounding visual
context and state of visual
adaptation. A secondary theme of this
research is the role color alignment
plays in the generation of expressive
energy in color combinations.
Expressive load of color combinations
can be predicted, based on selection
of color alignments. We are applying
this work to the measure of degree-of-
alignment between window and
background in a workstation. This work
will provide guidelines for effective
selection of window, font, and
background colors for any given
application.
60. Knowing the Individual
(Walter Bender)
Just as a display should "know" the
data, it should also be cognizant of
the user. The more the system knows
about the user, the better able it
will be to make sense of the
ambiguities and inconsistencies
inherent in human communication. Our
work in user modeling involves the
full exploitation of the user's
computational environment, so that
information normally provided by the
computer (e.g., idle time, schedule
information, electronic mail
subscriptions) and other, more
esoteric information (e.g., physical
location tracking systems, eye-
tracking systems, speech manipulation,
electronic newspapers, model-building
cameras) can be integrated to
construct dynamic, individual user
models that change over time, both as
users change and as the system learns
more about them.
61. Interactive Computation of
Holographic Images
(Professor Stephen A. Benton)
The display of holographic 3-D images
requires many megabytes of data to be
recomputed every time the image is
changed. These calculations simulate
the propagation and interference of
light beams, but numerical shortcuts
and other new techniques have reduced
computation times by more than a
factor of twenty, to well under one
second,
allowing truly interactive
manipulation and exploration of
complex 3-D image data.
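The underlying computation being accelerated is interference of wavefronts. The sketch below evaluates the fringe pattern from a single object point against a plane reference wave along a 1-D hologram line; the wavelength and geometry are arbitrary example values, and a real display must evaluate millions of such samples per update, which is why the shortcuts described above matter:

```python
import cmath
import math

# Interference of a unit plane reference wave with the spherical
# wave from one object point, sampled along a 1-D hologram line.
# `point` is (x, z) in meters; wavelength defaults to HeNe red.

def fringe_intensity(xs, point=(0.0, 0.1), wavelength=633e-9):
    k = 2 * math.pi / wavelength
    px, pz = point
    out = []
    for x in xs:
        r = math.hypot(x - px, pz)         # distance to object point
        obj = cmath.exp(1j * k * r) / r    # spherical wavefront
        ref = 1.0                          # unit plane reference wave
        out.append(abs(obj + ref) ** 2)    # recorded intensity
    return out

xs = [i * 1e-5 for i in range(50)]
I = fringe_intensity(xs)
```

Summing such contributions over every object point, for every sample, is the full computation; the project's numerical shortcuts attack exactly this cost.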
62. Scaled-Up Holographic Video
(Professor Stephen A. Benton)
The world's first electronic
holographic video display has
established the principles of
information reduction and image
scanning, but scaling up to practical
display sizes has posed significant
electronic and electro-optical
challenges. The parallelization of the
computation, storage, and display has
been shown feasible for 3" x 5"
images, laying the groundwork for
further scale-ups of image size.
63. Holographic Laser Printer
(Professor Stephen A. Benton and
Michael A. Klug)
Full-color, wide-angle, and large-size
computer-generated hard-copy holograms
still take considerable time to
create. A "holographic laser printer"
allows simpler hard-copy holograms to
be generated in minutes instead of
hours, automatically and without wet
processing. Research topics include
recording materials and processing,
optical design, image processing and
LCD display, and optical techniques
for image noise reduction.
64. Immersive Projected-Image
Holographic Displays
(Professor Stephen A. Benton)
The creation of meter-sized
holographic 3-D images can be achieved
with large-area holograms, or via the
projection of images from smaller
holograms into wraparound optical
systems. Here we explore the
distortions and properties of deeply
concave mirrors used as projection
elements.
65. Medical Image Holography
(Professor Stephen A. Benton)
MRI and CAT-scan cameras gather three-
dimensional data, but holography
offers the only way of examining those
images in fully three-dimensional
form. This project explores new image-
processing, editing, and rendering
tools that are needed to make these
complex 3-D images quickly and
accurately interpretable by
physicians.
66. Edge-Lit Holograms
(Professor Stephen A. Benton)
Conventional holograms require
illuminators to be mounted on walls or
ceilings near the hologram; edge-lit
holograms are a new type of white-
light hologram that allows the light
source to be included within the mount
itself, assuring a compact and
carefully aligned illumination. This
project explores the fundamental
diffraction and imaging properties of
these holograms with a view toward
making their images deeper, brighter,
and clearer.
67. Open Architecture Television
(Professor V. Michael Bove)
Open Architecture Television explores
the encoding of digital video in such
a way that the parameters of
production (resolution, frame rate)
may be decoupled from those of the
display, supporting a broad variety of
production and display systems and
permitting easy international
interchange as well as interworking
between television and computer
equipment. We have successfully
demonstrated this idea using
spatiotemporal subband coding, and
also have developed frame-rate
decoupling methods appropriate for
motion-compensated coders such as
MPEG.
68. Cheops: Data-Flow Television
Receiver
(Professor V. Michael Bove)
The Cheops Imaging System is a
compact, modular platform for
acquisition, real-time processing, and
display of digital video sequences and
model-based representations of moving
scenes. It is intended as both a
laboratory tool and a prototype
hardware and software architecture for
future programmable video decoders.
Rather than using a large number of
general-purpose processors and
dividing up image processing tasks
spatially, Cheops abstracts out a set
of basic, computationally intensive
stream operations that may be
performed in parallel and embodies
them in specialized hardware. Eight
systems have been built and are in use
at the Media Lab and at various
sponsor sites.
69. Motion Modeling for Video Coding
(Professor V. Michael Bove)
Most digital video-coding methods use
a very simple approximation to scene
motion that breaks up images into
arrays of square tiles and assigns a
two-dimensional motion vector to each.
We are developing video-coding methods
that segment scenes into coherently
moving regions and compute more
accurate motions for the regions. The
result should be a more compact
representation, better scene
understanding, and the ability to
compute images for arbitrary instants
in time (in connection with Open
Architecture Television research).
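The conventional baseline the project improves upon, block matching, can be sketched directly: for one square tile of the current frame, exhaustively search a window in the previous frame for the best-matching position under a sum-of-absolute-differences cost. The tiny integer "frames" and parameters here are illustrative:

```python
# Exhaustive block-matching motion estimation for one tile.
# `prev` and `cur` are 2-D lists; (bx, by) is the tile's top-left
# corner in `cur`, `bs` the block size, `search` the window radius.

def best_motion_vector(prev, cur, bx, by, bs=2, search=2):
    def sad(dx, dy):
        return sum(abs(cur[by + j][bx + i] - prev[by + dy + j][bx + dx + i])
                   for j in range(bs) for i in range(bs))
    best, best_cost = (0, 0), sad(0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if (0 <= bx + dx and bx + dx + bs <= len(prev[0])
                    and 0 <= by + dy and by + dy + bs <= len(prev)):
                c = sad(dx, dy)
                if c < best_cost:
                    best, best_cost = (dx, dy), c
    return best
```

Because every tile gets an independent 2-D vector regardless of scene content, object boundaries cut across tiles; segmenting scenes into coherently moving regions, as this project does, removes that limitation.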
70. Production, Distribution, and
Viewing of Structured Video Narratives
(Professor V. Michael Bove and
Professor Glorianna Davenport)
Research in video coding at the Media
Lab increasingly emphasizes structure
as a means of leveraging both
compression and story. Image
understanding, machine vision, and a
priori knowledge are used to produce
video representations in terms of
component parts (actors, backgrounds,
moving objects) and to produce content
annotations for story construction.
This form of coding has implications
for production, postproduction,
distribution, and viewing. The goal of
this project is to script, produce,
and work with a story represented as a
structured video database in order to
examine diverse issues including
script annotation and storyboarding,
camera design, production techniques,
data formatting, and viewing
paradigms.
71. Multimedia Testbed
(Professor Muriel R. Cooper, Ronald
MacNeil, and David Small)
The Meta-Media project integrates a
rich set of graphic tools and editors
with searching, browsing, linking,
scripting, and visualization
capabilities to allow research into
the new design issues emerging from
real-time, multilayered information in
an electronic communication
environment. The planning of
structured and unstructured
informational multimedia pathways
presents graphical design complexity
and challenge for both the designer
and the user of multimedia
information. Traditional media
designers from the print, audiovisual,
and animation worlds provide important
insights into guiding viewers'
perceptual responses to information.
Work that bridges the gap between the
hands-on world of designers and the
more abstract symbolic world of
programming explores spatial,
temporal, and relational rules and
methods which rank information for the
viewer, influence emotional responses,
and often embody hidden aesthetics.
Automatic layout and design
intelligence will be required to
filter data for users in every field.
Work is done in a sophisticated
hardware and software environment
which includes our own window manager.
72. Computationally Expressive Tools
(Professor Muriel R. Cooper, Ronald
MacNeil, and David Small)
We are developing a repertoire of
graphics that will allow computational
assistance in the expression of
dynamic and interactive design. In an
electronic information environment we
need new graphical principles, tools,
and editors which are suitable to the
integrated, interactive, dynamic, and
intelligent formation and presentation
of information. This graphical set
must be integrated with real-time
design-assistance systems in order to
cope with the magnitude of visual
complexity resulting from multiple
streams and forms of data that deluge
the user.
*Computational Graphics: Animation is
currently produced either by labor-
intensive cel animation, based on
expressive individual creativity, or
by traditional computer graphics
animation based on modeling of
physical behavior. While work in the
direction of coupling knowledge-based
animation is very young, we are
exploring ways of modeling and
animating data information as a set of
interactive tools - data as graphics/
behavior of information.
*Data-Driven Graphics: Data
visualization is the symbolic
counterpart of scientific
visualization in which we will build
transpositional models that will allow
various forms of on-the-fly
abstractions from real-time data
domains such as maps, weather, and
actuarial information.
*Behavioral Graphics: Information that
responds dynamically and interactively
to change based on physical models
drawn from work in scientific
visualization holds great promise. Our
work in responsive substrata that
allow the user to model paper fibers,
pigment, diffusion, and gravity will
be extended into informational models
that, for example, would graphically
indicate age or accuracy of data.
Further modeling of mark-making tools
and force feedback is planned.
*Animation: A cel-based animation
system with many unique capabilities
is the foundation for further
animation research. The integration of
hand-drawn animation with 3-D modeling
continues to be a research subject.
Work in moving back and forth from 2-D
to 3-D continues, as do investigations
into simple forms of automation.
*Sound-Graphics: This project explores
some of the unique and overlapping
characteristics of image and sound. In
Tone of Voice Typography, the color,
size, translucency, style, and even
meaning of a word may be driven by the
pitch of a sound over time. Recent
work includes sound at the interface,
sound/graphic objects, spatial sound,
and compositional and analytical tools.
*Adaptive Typography and Graphics: This
project is developing ways of filtering
typography and graphics on the fly for
greater legibility and maintaining the
perception of consistent color in an
unpredictable, changing environment.
These principles are being incorporated
with dynamics and intelligence, and
extended to include more complex
graphics.
*Topographical Typography: The goal of
this project is to develop dynamic
maps, typography, and graphics which
have knowledge of each other, and to
develop intelligent tools that allow
the effective design of graphical
behavior in relation to real-time
dynamic data.
*Visual Complexity and Selective
Filtering: Using Gaussian filters and
pyramid coding, translucency, blur,
and multiple layers of Landsat and
weather data, we are able to
selectively address aspects of complex
information in real time for task-
based information. Future work
includes making object-based elements,
local changes, zooming, two and one-
half and three-dimensional views, and
transitional changes.
*Configurable Interface Design: Ways
of interacting with these systems
graphically require new paradigms
beyond the desktop and window
metaphors. Integration of expressive
tools and graphical intelligence in a
multimedia environment will enhance
current work in graphical interfaces
that can adapt to task specifics and
personal preferences.
*Browsing and Navigation: Traversing
and navigating complex information
effectively requires new graphical
models which allow the user to
maintain context while exploring
multiple levels of information
simultaneously. The infinite zoom will
allow us to do nodal zooming while
maintaining graphical context in very
large informational databases.
73. Large-Scale, High-Resolution
Display Prototypes
(Professor Muriel R. Cooper, Ronald
MacNeil, and David Small)
Our prototype of a 2,000 by 6,000 line
display provides us with a testbed for
investigating the integration of
graphical presentation and
intelligence in interactive and
dynamic form. Integration of many of
our multimedia capabilities is
underway. The prototype is connected
to the Connection Machine and will
soon be connected to a fiber-optic
cable which will allow us to explore
collaborative and remote communication
and the implications of space on
information creation and management. A
prototype for an 8 x 10 flat panel
display is planned.
74. Input/Output Considerations
(Professor Muriel R. Cooper, Ronald
MacNeil, and David Small)
Hardcopy output will continue to play
a major role in the information
medium, and we will need intelligent
layout systems to transcode work areas
and sessions into appropriate layout
on paper. Work has just begun on this
aspect of the research.
75. Advanced Interactive Mapping
Displays
(Professor Muriel R. Cooper, Dr.
Richard A. Bolt, and Ronald MacNeil)
This topic represents a three-year
project involving:
*Development of graphically intelligent
tools and principles to support the
interactive creation of symbolic
information landscapes.
*Integration of such landscapes with
pictorially convincing virtual
environments.
*Enabling of multimodal natural
language communication with the
virtual environment display and its
contents via combinations of speech,
manual gesture, and gaze.
These three streams of investigation
are to converge in year three of the
overall project in the context of an
ultra-high-definition, seamlessly
tiled wall-sized display (DataWall).
76. Experiments in Elastic Media
(Professor Glorianna Davenport)
We define "Elastic Media" to be a user-
directed form of media storytelling in
which the computer mediates between
the user and chunks of content.
Content prototypes are developed to
demonstrate relationships between
content, form, modes of interaction,
and computational substructures.
Issues include research and production
of content segments and meaningful
machine-based orchestration of these
segments, based on user input. Current
projects include:
*Elastic Boston 2: A new content-based
project which focuses on the
intersection of a documentary style
guide to an urban venue and a shared
communication network. The project
will focus on downtown Boston, an area
from the Causeway to South Station,
including Faneuil Hall and the North
End. The system will offer
personalized local news, in-depth
reporting, community portraits, and
advertising. The application will
invite community members to share
localized exchanges concerning their
impressions, memories, and
activities.
*Movie-Maze: A virtual world has been
created for browsing movie trailers.
The world can be thought of as a 3-D
graphical mud in which users can
communicate with each other while they
are exploring the world and the movies
it contains.
*New Orleans Interactive (HyperCard
implementation): This project
explores structural issues related to
the design of complex documentary
narratives for education.
*Video Postcards: These are a semi-
structured form of personal
communication. Electronic postcard
formats should support inclusion of
low bandwidth movies suitable for tele-
network transmission.
*Wheel of Life: This project
represents multimedia which has
escaped the bounds of the box; this
project raises interesting issues
about interactive spaces and
collaborative discovery. This research
is particularly relevant for museum
exhibit design, theme parks, and
electronic performance spaces.
77. Video Editing: Computational
Partnerships
(Professor Glorianna Davenport)
Movie editing is extremely time-
consuming, so time-consuming, in fact,
that few home movies are ever edited.
The connection between video as
information and video as story will
become increasingly critical as
digital transmission of video from
remote visual databases becomes
viable. The goal of this work is to
integrate the moviemaker's knowledge
of content and craft into software in
order to model more robust human-
machine partnerships for video
storytelling. Systems include logging,
sequencing, and editing modules.
*Stratification: This research in
video description incorporates our
understanding of how the camera
mediates the environment while
recording content. The logging
environment is stream-based. The
browsing interface emphasizes
scalability of description hierarchy
and a graphical continuum. Both the
annotation and sequencing tools are
linked to Framer to allow maximum
interchange between machine-based
annotation algorithms, human
annotation, and storytelling
structures. The interface will be
expanded to include the creation and
use of low-level and high-level
relationships found within the
content.
*Log Boy and Filter Girl: This work
focuses on programmatic storytelling.
The system encourages the filmmaker to
think about the multiple playouts of a
story during script development. After
describing the story purpose and
defining the character set, the
filmmaker defines axes of interaction
which will be allowed in the story
playout. These axes serve as an
organizing metaphor for script
expansion. The logging is defined as a
function of filtering and vice versa.
The logging process is dynamic,
graphical, and attribute-oriented. A
series of predefined filters can be
expanded by the user, based on
particular needs. Filters reflect axes
of interaction. Several filters are
generally cascaded to offer maximum
flexibility in shot selection.
*Video Streamer and Collage: This is a
two-and-one-half-dimensional paradigm
for browsing which includes
object-like graphical selection of
video from any source. The interface
focuses on multiple views of the video
and audio stream, including the edge
of the frame, the frame in relation to
other frames, and audio associated
with a given frame. The stream is
parsed algorithmically for shot
boundaries. Movie clips can be
selected and manipulated in a collage
space.
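One common heuristic for the algorithmic shot-boundary parsing mentioned above is to flag a cut wherever the intensity histogram changes sharply between consecutive frames. The sketch below works on flat lists of pixel values; the bin count and threshold are illustrative assumptions:

```python
# Shot-boundary detection by histogram differencing. Each frame is a
# flat list of grayscale pixel values in 0..255.

def histogram(frame, bins=4, maxval=256):
    h = [0] * bins
    for v in frame:
        h[v * bins // maxval] += 1
    return h

def shot_boundaries(frames, threshold=0.5):
    """Return indices of frames that start a new shot."""
    cuts = []
    for i in range(1, len(frames)):
        a, b = histogram(frames[i - 1]), histogram(frames[i])
        # L1 histogram distance, normalized by frame size.
        diff = sum(abs(x - y) for x, y in zip(a, b)) / len(frames[i])
        if diff > threshold:
            cuts.append(i)
    return cuts
```

Histogram differencing is robust to motion within a shot (the histogram barely changes) but responds strongly to the wholesale content change at a cut.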
78. Stories with a Sense of Themselves
(Professor Glorianna Davenport)
Current research into multithreaded
stories and storytelling tools begs
the issue of an author with a deep
sense of commitment to the story being
told. This project seeks to
explore the relationship between
personalization of the story for the
viewer and tools which specify the
author's concepts and constructs.
*Digitally Orchestrated Micromovies:
For many applications, story filters
designed by the author will allow a
viewer to drive through a database of
micromovies. The filters can include
simple content relationships and
stylistic features. The method is
illustrated with several prototype
movies, including "Endless
Conversation," "Dial a News Summary,"
and "This Ad is for YOU."
*Multithreaded Narratives: This
project is a theoretical and practical
exploration into narrative structures.
*Semantic News Network: This project
is a look at how information services
might be structured to accommodate
thoughtful interactions.
79. Directing Digital Video: New
Tools
(Professor Glorianna Davenport)
As digital video comes into its own,
directors will need new tools to
preview and construct story elements
for multithreaded, interactive
scenarios.
*The Director's Eyeglass: This project
is a portable prototype which will
allow a director to preview digital
effects in the field.
*Coding Camera Motion and Field of
View: This project looks at the
mechanics for recording and using
information about the camera view to
link content segments.
*The Journalist's Conceptual Notepad:
This project looks at how the
journalist can create a rich, machine-
readable conceptual framework during
development of story concepts. The
project will encourage the
preservation of the journalist's
framework during the reconstruction of
a story by personalizing agents.
80. Storyteller Systems
(Professor Glorianna Davenport and
Professor Kenneth Haase)
Storyteller systems are sophisticated
programs with deep and detailed
knowledge of some particular domain or
domains and access to "media
resources" - recorded video, sound,
and text - regarding the domain. By
combining these resources with
synthesized graphical and textual
representations, a storyteller system
produces a story customized to what it
knows - and what it learns - of a
listener's background, preferences,
and interests. These stories emerge
dynamically as the system interacts
with the user; questions and
criticisms yield wholly new sequences
of video, sound, and explanation in
reply. Such systems transform the
character of publication: rather than
producing epistles, one produces
emissaries.
81. Production, Distribution, and
Viewing of Structured Video Narratives
(Professor Glorianna Davenport and
Professor V. Michael Bove)
Research in video coding at the Media
Lab increasingly emphasizes structure
as a means of leveraging both
compression and story. Image
understanding, machine vision, and a
priori knowledge are used to produce
video representations in terms of
component parts (actors, backgrounds,
moving objects) and to produce content
annotations for story construction.
This form of coding has implications
for production, postproduction,
distribution, and viewing. The goal of
this project is to script, produce,
and work with a story represented as a
structured video database in order to
examine diverse issues including
script annotation and storyboarding,
camera design, production techniques,
data formatting, and viewing
paradigms.
82. Real-Time Modeling
(Professor Neil Gershenfeld)
As routinely accessible computers
begin to approach gigaflop speeds and
as data networks approach
gigabit/second bandwidths, it becomes
possible to interact in real time with
meaningful numerical models. We are
exploring this promise in the context
of musical instruments, both because
of its significance for their
evolution and because they provide an
extremely demanding environment that
requires the integration of multiple
degrees of freedom of real-time I/O
with state-of-the-art computational
processing. We will be doing
experiments to characterize the
physics of successful traditional
instrument designs, using these
experiments to guide the creation of
numerical representations (based on
both first-principles physical models
and on nonlinear time-series
analysis), and developing new
approaches to interface a player to
these models. The initial goal is to
capture the instrument's performance
from the perspective of the player
(i.e., pass a musical Turing test),
and the longer term goal is to move
beyond these traditional designs while
still maintaining their mature
richness and subtlety. It is
anticipated that the tools that are
developed for this will be applicable
to more general human-machine
interaction problems.
83. Interface Sensors and Transducers
(Professor Neil Gershenfeld)
Technological interfaces must sense
user activity on a wide range of
length scales, ranging from less than
a millimeter (stylus input), through
centimeters (gesture sensing) and
meters (local tracking), to kilometers
(navigation). Increasingly, these
measurements must be done in three
dimensions, must produce images as
well as measurements, and must
maintain the required spatial and
temporal resolution without
significantly encumbering the user.
Force must often be measured along
with position, and it may be desirable
to generate output force (tactile
feedback). Unfortunately, the poor
state of the available sensing and
transduction technology for these
problems has been a significant
constraint on the development of many
new applications. We are using a range
of experimental techniques to develop
the instrumentation for the
environment around information
processing systems. This includes
designing and applying new materials,
the use of lensless imaging, and the
active remote interrogation of passive
sensors.
84. Information, Computation, and
Physics
(Professor Neil Gershenfeld)
Information, as logical content,
necessarily has a physical reality.
Although these two levels of
description are usually entirely
distinct (the designer of a
conventional memory circuit does not
need to know what messages it will
store), there are exciting
possibilities and increasingly serious
constraints associated with their
interface in devices that store,
transmit, and manipulate information.
We are exploring this area in both
directions: using physical insights
to help solve engineering problems
(such as the use of dissipative
dynamic systems to satisfy
communication channel constraints) and
using engineering insights to help
understand physical systems (such as
applying ideas from information theory
to help understand complex physical
systems). A central theme is the
relationship between logical and
physical entropy; here we are studying
the use of active devices to bypass
conventional thermodynamic limits in
logic.
85. Incremental Coding
(Andrew B. Lippman)
High-quality compression is inherently
asymmetric - robust source processing
directly yields more efficient image
representations. Expecting that the
original material may be available
only once, this research is directed
at creating a compressed, intermediate
format that can be translated into
consumer distribution formats for any
rate from 1.5 megabits/second to
studio quality, using hardware no more
complex than a home decoder. A
corollary is real-time encoding that
is later asymmetrically processed (in
the background) to reduce the
immediately available digital
workprint to a distribution format.
86. Movies via Modems
(Andrew B. Lippman)
Ultralow bandwidth coding divides a
scene into background and dynamic
elements (objects) that can composite
any individual frame. An example is
telephonic movies where a library of
essential scene elements is
distributed in advance, but the cues
needed to assemble them into a movie
are sent at viewing time over normal
telephone lines. Alternatively, one
could store more than one episode of a
series on a single compact disc. This
"book of the month" movie system
allows periodic distribution of the
core parts of many movies on one
compact disc (or by downloading),
combined with real-time telephone
delivery of assembly rules.
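The library-plus-cues idea can be sketched in miniature: scene elements (here, 1-D sprites over a background) are stored locally in advance, and each frame is assembled from a short cue listing element IDs and positions. Only the cues would need to travel over the phone line. The names and structures below are invented for illustration:

```python
# Assemble one frame from pre-distributed elements plus a cue list.
# `library` maps element IDs to sprites (lists of pixel values);
# `cues` is a list of (element_id, position) pairs for this frame.

def assemble_frame(library, background, cues):
    frame = list(background)             # start from the stored background
    for element_id, pos in cues:
        sprite = library[element_id]
        for i, v in enumerate(sprite):
            if 0 <= pos + i < len(frame):
                frame[pos + i] = v       # composite the element in place
    return frame
```

The bandwidth asymmetry is the point: the library is large but sent once (on disc or by download), while the per-frame cues are a few bytes each, small enough for real-time telephone delivery.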
87. Objective Coding
(Andrew B. Lippman)
Objective Coding generalizes early
work on Scene Widening (1992) to
analyze a picture sequence into
components separated by their
activity. The goal is similar to book-
of-the-month movies, but the
concentration is on scene analysis.
Objective coding uses panoramic
storage and compositing to construct
each frame of a sequence by warping
and shifting elements stored in
memory. In the current work, the basic
architectural elements of an MPEG
decoder are reconfigured so that its
internal memory contains enlarged
background and foreground objects
instead of adjacent frames.
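A toy version of the panoramic-memory
idea: the decoder holds one enlarged
background, and each frame is produced
by shifting a view window across it and
overlaying a foreground object, rather
than by decoding adjacent frames. The
sizes and names here are invented for
illustration.

```python
import numpy as np

# Enlarged background held in decoder memory (320 px wide),
# plus one foreground object.
panorama = np.tile(np.arange(320, dtype=np.uint8), (120, 1))
sprite = np.full((30, 30), 255, dtype=np.uint8)

def render(pan_x, sprite_rc):
    """Crop a 160-pixel-wide view at horizontal offset pan_x,
    then paste the foreground sprite at (row, col) sprite_rc."""
    view = panorama[:, pan_x:pan_x + 160].copy()
    r, c = sprite_rc
    view[r:r + 30, c:c + 30] = sprite
    return view

# Simulating a camera pan: only the offset and sprite position
# change per frame; the panorama is decoded once.
frames = [render(pan_x, (45, 65)) for pan_x in range(0, 40, 10)]
```

Per-frame state reduces to a shift and
an object placement, which is the sense
in which the scene is "analyzed into
components separated by their
activity."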
88. Dimensionalization
(Andrew Lippman and Henry Holtzman)
Images from multiple still and cinema
cameras aimed at the same event are
merged into a four-dimensional (x, y,
z, t) visual database of the scene to
allow multiple perspectives,
relighting, and new picture content.
Ultimately, this approach might allow
the viewer to roam through the set,
taking the position of the camera
operators or anyplace in between.
Initial work addressed static scene
elements ("Lucy's Kitchen"); current
work extends this to include moving
elements, live actors, and the mixture
of still photographs with movie
footage.
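One way to picture the (x, y, z, t)
visual database: samples from several
cameras are binned into a sparse
space-time grid, and a query merges
whatever samples fall in the requested
cell. The grid resolution and the
class API below are assumptions made
for illustration only.

```python
from collections import defaultdict

class SceneDB:
    """Sparse space-time grid of scene samples from many cameras."""

    def __init__(self, cell=1.0):
        self.cell = cell
        self.bins = defaultdict(list)   # (ix, iy, iz, it) -> [values]

    def _key(self, x, y, z, t):
        return tuple(int(v // self.cell) for v in (x, y, z, t))

    def add(self, x, y, z, t, value):
        """Record one sample (e.g. a brightness) seen by some camera."""
        self.bins[self._key(x, y, z, t)].append(value)

    def query(self, x, y, z, t):
        """Merge all cameras' samples for this space-time cell."""
        vals = self.bins.get(self._key(x, y, z, t))
        return sum(vals) / len(vals) if vals else None

db = SceneDB()
db.add(1.2, 0.3, 4.0, 0.0, 100)        # camera A
db.add(1.4, 0.3, 4.1, 0.0, 120)        # camera B, same cell
merged = db.query(1.3, 0.3, 4.0, 0.0)  # both cameras' views merged
```

Rendering a novel viewpoint then
amounts to querying such a database
along new rays, rather than replaying
any one camera's footage.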
89. Casual Collaboration
(Andrew Lippman and Judith Donath)
Video images are used to create a
visual and interactive representation
of an on-line collaborative community.
A database format is being developed
that permits the modification and
reuse of the basic images to represent
changing events in the visualized
community.
The research investigates perceptual
issues in synthesizing a coherent
scene from disparate parts, social
issues in the visual depiction of a
community, and technical issues in the
integration of live and processed
video.
90. Structure out of Sound
(Andrew Lippman, Professor Marvin
Minsky, and Michael Hawley)
In an information-rich environment
where data, images, and sound are
readily accessible and digitally
communicated, content-based search
becomes a necessity.
Structure out of Sound is the first
attempt at a unified analysis tool for
speech, music, and sound effects.
Movies are analyzed into sonic
primitives that allow one to divide a
movie into dialogue and action or to
identify the presence of a single
actor. The initial work, a doctoral
thesis, lays the groundwork for the
later addition of visual browsing and
correlating elements.
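A crude stand-in for one piece of such
analysis: label windows of an audio
track as low- or high-energy, a first
step toward separating quiet dialogue
from loud action. The thesis work is
far richer than this; the sketch only
illustrates deriving per-window sonic
primitives from a track.

```python
import numpy as np

def label_windows(signal, win=1024):
    """Return one 'quiet'/'loud' label per window, splitting at
    the median short-time RMS energy of the track."""
    n = len(signal) // win
    rms = np.sqrt(np.mean(
        signal[:n * win].reshape(n, win) ** 2, axis=1))
    threshold = np.median(rms)
    return ["loud" if e > threshold else "quiet" for e in rms]

# Synthetic track: soft noise followed by a loud burst.
rng = np.random.default_rng(0)
track = np.concatenate([0.05 * rng.standard_normal(4096),
                        0.8 * rng.standard_normal(4096)])
labels = label_windows(track)
```

Runs of "quiet" and "loud" labels give
a rough dialogue/action segmentation
that higher-level analysis (speech,
music, effects) can then refine.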
91. Hyperinstruments
(Professor Tod Machover)
The Hyperinstruments project attempts
to define and produce what we consider
models for the musical instruments of
the future. These
prototypes combine new definitions of
musical virtuosity with intelligent
machine understanding and music
structure generation. Efforts
continued during the past year to turn
our "HyperLISP" environment into a
general research tool, one which is
currently employed by various
researchers at the Media Lab and at
various other centers and
institutions. Work on the automated
music generation and analysis system
Cypher was completed, and is the
subject of a book to be published soon
by the MIT Press. Various music
cognition studies into phenomena such
as beat and phrase tracking have
yielded intelligent algorithms which
are being incorporated into our
systems. Research is continuing on
turning physical gesture (notably a
conductor's left-hand articulations)
into real-time control signals, using
specially designed hand-tracking
technology. Special emphasis has been
placed on the physical and sonic
detection of existing acoustic musical
instruments, most notably stringed
instruments, including joint-angle
movement sensing, finger-position
sensing, bow-position sensing, and
special digital signal processing
techniques for pitch, timbre, and
phrase analysis, including some using
synchronized dot patterns. Several new
musical compositions, including one
for the cellist Yo-Yo Ma, have been
produced and performed using our
hyperinstrument techniques.